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Abstract — We study efficient importance sampling techniques 
for particle filtering (PF) when either (a) the observation like- 
lihood (OL) is frequently multimodal or heavy-tailed, or (b) 
the state space dimension is large or both. When the OL is 
multimodal, but the state transition pdf (STP) is narrow enough, 
the optimal importance density is usually unimodal. Under this 
assumption, many techniques have been proposed. But when 
the STP is broad, this assumption does not hold. We study 
how existing techniques can be generalized to situations where 
the optimal importance density is multimodal, but is unimodal 
conditioned on a part of the state vector. 

Sufficient conditions to test for the unimodality of this condi- 
tional posterior are derived. Our result is directly extendable to 
testing for unimodality of any posterior. 

The number of particles, N, to accurately track using a PF 
increases with state space dimension, thus making any regular 
PF impractical for large dimensional tracking problems. But 
in most such problems, most of the state change occurs in 
only a few dimensions, while the change in the rest of the 
dimensions is small. Using this property, we propose to replace 
importance sampling from a large part of the state space (whose 
conditional posterior is narrow enough) by just tracking the mode 
of the conditional posterior. This introduces some extra error, 
but it also greatly reduces the importance sampling dimension. 
The net effect is much smaller error for a given N, especially 
when the available N is small. An important class of large 
dimensional problems with multimodal OL is tracking spatially 
varying physical quantities such as temperature or pressure in a 
large area using a network of sensors which may be nonlinear 
and/or may have non-negligible failure probabilities. Improved 
performance of our proposed algorithms over existing PFs is 
demonstrated for this problem. 



I. Introduction 

Tracking is the problem of causally estimating a hidden 
state sequence, {X t }, from a sequence of noisy and possibly 
nonlinear observations, {Y t } that satisfy the Hidden Markov 
Model (HMM) assumption. A tracker recursively computes (or 
approximates) the "posterior" at time t, using the posterior at 
t— 1 and the current observation Y t . For nonlinear and/or non- 
Gaussian state space models, the posterior cannot be computed 
exactly. But, it can be efficiently approximated using a se- 
quential Monte Carlo method called particle filtering (PF) [3], 
[4], [5]. A PF outputs at each time t, a cloud of N weighted 
particles whose empirical measure closely approximates the 
true posterior for large N. A generic PF is summarized 
in Algorithm 1. There are two main issues in PF design: 
(a) choice of importance sampling density that reduces the 
variance of the particle weights and thus improves "effective 
particle size" [6] and (b) choice of resampling techniques 
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that improve effective particle size while not significantly 
increasing "particle impoverishment" [4]. Some solutions for 
(b) are [5, Ch. 13], [7], [8]. Our focus is on designing efficient 
importance densities and analyzing the assumptions under 
which they work, when either or both of the following occur: 

1) The observation likelihood (OL) is frequently multi- 
modal or heavy-tailed (or most generally, not strongly 
log-concave) as a function of the state and the state 
transition prior (STP) is broad. 

2) State space dimension is large (typically more than 10 or 
12). It is well known [3], [9] that the number of particles 
for a given tracking accuracy increases with state space 
dimension. This makes any regular PF impractical for 
large dimensional state spaces (LDSS). 

Definition 1 (Multimodal (or heavy -tailed) OL): refers to 
the OL, p(Y t \X t ), having multiple local maxima (or a heavy 
tail) as a function of the state, X t , for a given observation, Y t . 
An example is the observation model for the nonstationary 
growth model of [3]: Y t — X? + wt- Here, the OL is bimodal 
with modes at X t — ±\/Ft whenever Y t is significantly 
positive. Another example is the clutter model of [10]. 

Other examples are as follows. Consider tracking spatially 
varying temperature change using a network of sensors (see 
Example 1). Whenever one or more sensors fail (e.g. due to a 
large unmodeled disturbance or some other damage), the OL 
is often heavy-tailed or multimodal (see Fig. 1). The models 
of Example 1 are also similar to the commonly used clutter 
model in radar based target tracking applications or in contour 
tracking applications, e.g. Condensation [10], and to outlier 
noise models used in other visual tracking problems [11] or 
in aircraft navigation problems [9]. Another reason for OL 
multimodality is having a sensor that measures a nonlinear 
(many-to-one) function of the actual temperature. For e.g., 
the growth model of [3]. Another many-to-one example is 
when the observation is a product of functions of two subsets 
of states plus noise, for e.g. bearings-only tracking [3] or 
illumination and motion tracking [12], [13]. 

Note that even though our work was motivated by tracking 
problems with frequently multimodal OL, it is equally well 
applicable to any problem where the posterior is often multi- 
modal (e.g. due to nonlinearities in the system model), but is 
unimodal conditioned on a part of the state space. 

Large dimensional state spaces (LDSS) occur in tracking 
time-varying random fields, such as temperature or pressure, 
at a large number of nodes using a network of sensors [14], 
[15] (applications in environment monitoring and weather 
forecasting); in tracking AR parameters for noisy speech [16]; 
and in visual tracking problems such as tracking deforming 
contours [17], [18], [19], [11], tracking spatially varying 
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(a) Bimodal OL, Narrow STP: Unimodal p* (b) Bimodal OL, Broad STP: Bimodal p* (c) Heavy-tailed OL, Broad STP: Bimodal p* 

Fig. I. Demonstrating the effect of multimodal or heavy-tailed OL and broad STP for a M = 1 dimensional version of Example 1 with 
temperature independent failure. Xt is temperature. The STP is M{Xt-i,<j1 ys ), i.e. Example 1 with a — 0. Fig. 1(a): One out of J = 2 
sensors fails (bimodal OL) but narrow enough STP (a^ ys = 1). So p* is unimodal. Fig. 1(b): One out of J = 2 sensors fails (bimodal OL) 
and broad STP (a^ ys — 5). So p* is bimodal. Fig. 1(c): Estimating temperature but with J = 1 sensor and broad STP {cr 2 sys = 5). When 
the sensor fails, the OL is heavy-tailed and peaks at the wrong mode. Thus p* is bimodal with the wrong mode being the strong one. Note 
that the correct mode is so weak it may get missed in numerical computations. 



illumination change [12], [13] or tracking sets of "landmark" 
points [20]. In all of the above problems, at any time, "most 
state change " occurs in a small number of dimensions, while 
the change in the rest of the state space is small. We call 
this the "LDSS property. The LDSS property is related to, but 
different from, the assumption used by dimension reduction 
techniques such as Principal Components Analysis (PCA). If 
X t is a stationary large dimensional time series, or if X t 
projected along a large part of the state space is asymptotically 
stationary, PCA can be used for dimension reduction. Under 
a similar assumption, another PF has been recently proposed 
[21]. But if X t follows a random walk model (the increments, 
Xt—Xt-i, are stationary) in all dimensions, one cannot simply 
eliminate the low variance directions of X t —X t -i, or use [21]. 
This is because the variance of X t even along these directions 
will be significant as t increases. 

A generic PF is summarized in Algorithm 1. The most 
commonly used importance sampling density is the STP [3]. 
This assumes nothing and is easiest to implement. But since 
this does not use knowledge of the observation, the weights' 
variance can be large (particularly when the STP is broad 
compared to the OL), resulting in lower effective particle sizes 
[4]. The "optimal" importance density [6], i.e. one that min- 
imizes the variance of weights conditioned on past particles 
and observations until t, is the posterior conditioned on the 
previous state, denoted p*. When p* is unimodal (at least 
approximately), PF-Doucet [6] approximates it by a Gaussian 
about its mode (Laplace's approximation) and importance 
samples from the Gaussian. Laplace's approximation has also 
been used for approximating posteriors in different contexts 
earlier [22], [23], [24]. Other work in PF literature that also 
implicitly assumes thatp* is unimodal includes [4], [25], [26]. 
When the OL is multimodal, p* will be unimodal only if the 
STP is unimodal and narrow enough (see Fig. 1). In many 
situations, especially for LDSS problems, this does not hold. 
We develop the PF with Efficient IS (PF-EIS) algorithm to 
address such situations. PF-EIS assumes unimodality of p* 
conditioned on a few states which we call "multimodal states". 

Sufficient conditions to test for the unimodality of this 
conditional posterior are derived in Theorem 1. To the best 
of our knowledge, such a result has not been proved earlier. It 
is equally applicable to test for unimodality of any posterior. 



When in addition to multimodality, the state space space 
dimension is also large (typically more than 10 or 12), the 
number of particles required for reasonable accuracy is very 
large [3], [9] and this makes a regular PF impractical. One 
solution that partially addresses this issue is [5, Ch 13] or 
[7] which propose to resample more than once within a time 
interval. But more resampling results in more particle impov- 
erishment [4]. When the state space model is conditionally 
linear-Gaussian, or when many states can be vector quantized 
into a few discrete centers (need to know the centers a- 
priori), Rao Blackwellization (RB-PF) [27], [9] can be used. 
In general, neither assumption may hold. But when the LDSS 
property holds, it is possible to split the state space in such a 
way that the conditional posterior of a part of it is quite narrow, 
besides being unimodal. If it is narrow enough, importance 
sampling (IS) from this part of the state space can be replaced 
by just tracking the mode of the conditional posterior (mode 
tracking (MT)). The resulting algorithm is called PF-EIS-MT. 
MT introduces some extra error. But it greatly reduces the IS 
dimension. The net effect, is that a much smaller number of 
particles are required to achieve a given error, thus making PF 
practical for LDSS problems. 

In summary, our contributions are (a) two efficient algo- 
rithms for multimodal and large dimensional problems, (PF- 
EIS and PF-EIS-MT); and (b) a set of sufficient conditions 
to test for unimodality of the conditional posterior (Theorem 
1) and heuristics based on it to split the state space in the 
most efficient way. PF-EIS and Theorem 1 are derived in Sec. 
II. A generic LDSS model is introduced in Sec. III. Practical 
ways of choosing the "multimodal states" are discussed in 
Sec. IV. PF-EIS-MT and PF-MT are introduced in Sec. V. 
Relation to existing work is described in Sec. VI. In Sec. VII, 
we given extensive simulation results comparing our methods 
with existing work for the temperature field tracking problem. 
Conclusions and open issues are presented in Sec. VIII. 

II. PF-EIS: PF-Efficient Importance Sampling 

We denote the probability density function (pdf) of a 
random vector X, fx(X), using the notation p(X) and we 
denote the conditional pdf, /x|y(^|^)> by p(X\Y). Consider 
tracking a hidden sequence of states X t from a sequence of 
observations Y t which satisfy the HMM property: 
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Algorithm 1 Generic PF. Going from n£L 1 to Tr? (X t ) = 2~2iLi w^5(X t — XI) (note 8(X — a) denotes a Dirac delta function at a) 

A PF starts with sampling N times from tto at t = to approximate it by tt^(Xo). For each t > 0, it approximates the Bayes 
recursion for going from to tt^ using sequential importance sampling. This consists of the following 3 steps: 

1) Importance Sample (IS): For i = 1, 2...N, Sample X\ ~ q_{X\). The IS density, q, can depend on X\. t _ l , Y\. t . 

2) We/g/zf: For i = 1, 2.. .TV, compute the weights: tuj = /^ (3) , where = w j_ 1 ^H*)^HLi2. 

3) Resample: Replicate particles in proportion to their weights & reset w\ for all i [4]. Set t <— t + 1 & go to step 1. 



Assumption 1 (HMM): For each t, 

1) The dependence X t -\ — ► A t is Markovian, with state 
transition pdf (STP), p(X t |X t _i). 

2) Conditioned on X t , F t is independent of past and future 
states and observations. The observation likelihood (OL) 

is p(Y t \X t ). 

A generic particle filter (PF) is summarized in Algorithm 1. 

A. PF-EIS Algorithm 

Consider designing a PF for a given state space model. The 
optimal importance sampling density [6] is p(X t \Xl_ 1 ,Y t ) = 
p*(X t ). In most cases, this cannot be computed analyti- 
cally [6]. If p* is unimodal (at least approximately), [6] 
suggests approximating it by a Gaussian about its mode 
and sampling from it (Laplace's approximation [24]). But, 
when the OL is multimodal, or heavy-tailed, or otherwise 
not strongly log-concave, p* will be unimodal only if the 
STP is unimodal and narrow enough and the predicted state 
particle is near enough to an OL mode (see Fig. 1). In 
many situations, this may not hold in all dimensions. But in 
most such situations, the STP is broad and/or multimodal in 
only a few directions of the state space which we call the 
"multimodal" directions. It can be shown that if the STP is 
unimodal and narrow enough in the rest of the directions, 
p* will be unimodal conditioned on the "multimodal states" 
(Theorem 1). When this holds, we propose to split the state 
vector as X t = [X t}S ;Xt^} in such a way that X ttS contains 
the minimum number of dimensions for which p* is unimodal 
conditioned on it, i.e. 

p"'\X t , r )±p*{X t \Xi t ,)=p(X t , r \Xi_ 1 ,Xi ta ,YJ (1) 

is unimodal. We sample X t , s from its STP (to sample the pos- 
sibly multiple modes of p*), and use Laplace's approximation 
to approximate p**' % and sample X t . r from it, i.e. sample X\ r 
from M{m\,Y, % IS ) where 

ml = ml(XU,Xl s ,Y t ) 4 m inI?(X t , r ), where, 

Z} s 4 [(V^Xmj)]- 1 
L'iXtr) 4 - \og[p**'\X t . r )] + const (2) 

W 2 L l denotes the Hessian of U . The weighting step also 
changes to satisfy the principle of importance sampling. The 
complete algorithm is given in Algorithm 2. We call it PF 
with Efficient Importance Sampling (PF-EIS). As we shall see 
later, it is very expensive to exactly verify the unimodality 
conditions of Theorem 1. But even if X t . s is chosen so that 
p**' 1 is unimodal for most particles and at most times (i.e. is 



unimodal with high probability), the proposed algorithm works 
well. This can be seen from the simulation results of Sec. VII. 

B. Conditional Posterior Unimodality 

We derive sufficient conditions for unimodality of the condi- 
tional posterior, p**'\ Let dim(X M ) = K, dim(X tir ) = M r , 
dim(X t ) = M = K + M r . Because of the HMM structure, 

P**'\X t!r ) - Cp(^l4 s ,^t,rM^t,r|^-l,4 s ) O) 

where £ is a proportionality constant. 

Definition 2: We first define a few terms and symbols. 

1) The notation A > (A > 0) where A is a square matrix 
means that A is positive definite (positive semi-definite). 
Also, A > B (A > B) means A - B > (A - B > 0). 

2) The term "minimizer" refers to the unconstrained local 
minimizer of a function, i.e. a point xq s.t. f(xo) < f(x) 
V a; in its neighborhood. Similarly for "maximizer". 

3) A twice differentiable function, f(x), is strongly convex 
in a region 1Z, if there exists an to > s.t. at all points, 
x e 1Z, the Hessian V 2 /(x) > ml. If / is strongly 
convex in 1Z, it has at most one minimizer in 1Z and it 
lies in the interior of 1Z. If / is strongly-convex on R M , 
then it has exactly one (finite) minimizer. 

4) A function is strongly log-concave if its negative log is 
strongly convex. An example is a Gaussian pdf. 

5) Since a pdf is an integrable function, it will always have 
at least one (finite) maximizer. Thus a pdf having at most 
one maximizer is equivalent to it being unimodal. 

6) The symbol E[.] denotes expected value. 

7) We denote the — log of OL using the symbol E Yt , i.e. 

E Yt {X t ) 4 - \ogp(Y t \X t ) + const (4) 

8) We denote the - log of the STP of X t<r as 

D\X ttr ) 4 -logp(X t , r X*J + const (5) 

9) When the STP of X t , r is strongly log-concave (assumed 
in Theorem 1), we denote its unique mode by 

/; 4 f r (XU,Xl s ) = argmaxp(X^|^_ 1 ,XL) (6) 

10) [z] p or z p denotes the p th coordinate of a vector, z. 

11) maxp is often used in place of max p= i, 2 ,.. .Mr- 
Combining (3), (4) and (5), L % (X t ^ r ) can be written as 

L\X t . r ) = Ey t (Xl a ,X t , r ) + D\X t , r ) (7) 

Now, p**' l (X t , r ) will be unimodal if and only if we can show 
that L l has at most one minimizer. We derive a set of sufficient 
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Algorithm 2 PF-EIS. Going from tt^ to ir?(X t ) = Eti w^^Xt - XX), Xj = [■#,., X|, r ] 

1) Importance Sample X t . s : Vi, sample s <~ a |-3Q_i). 

2) Efficient Importance Sample X t . r : Vi, sample X£ r <~ M{X\ r \m\, Here mJpQ^, X\ s , Y t ) = 
argmin Xt , r L^X^) and £} s = (V^^mj))" 1 and L* is defined in (7). 

3) Weight: ^, compute = *U where u>« = "^^^'"Vf ~ where = [ X ls> X U 

4) Resample [4]. Set i <— t + 1 & go to step 1. 



conditions on £V t , £> 1 and /* to ensure this. The main idea is 
as follows. We assume strong log-concavity (e.g. Gaussianity) 
of the STP of X t .r- Thus D t (X t , r ) will be strongly convex 
with a unique minimizer at /*. But E Yt (X t ) (and so E Yt as a 
function of X t , r ) can have multiple minimizers since OL can 
be multimodal. Assume that E Yt (X\ s , X t . r ) is locally convex 
in the neighborhood of /* (this will hold if /* is close enough 
to any of its minimizers). Denote this region by TZlc- Thus, 
inside TZlc, L l will be strongly convex and hence it will 
have at most one minimizer. We show that if max p |[VD] P | is 
large enough outside TZlc (the spread of the STP of X t<r is 
small enough), L 1 will have no stationary points (and hence 
no minimizers) outside TZlc or on its boundary. 

This idea leads to Theorem 1 below. Its first condition 
ensures strong convexity of D l everywhere. The second one 
ensures that TZlc exists. The third one ensures that 3 an 
eo > 0, s.t. at all points in TZ C LC (complement of TZlc), 
maxp |[VL*]p| > eo (i.e. U has no stationary points in TZ C LC ). 

Theorem 1: p**' i (X ttr ) is unimodal with the unique mode 
lying inside TZlc if Assumption 1 and the following hold: 

1) The STP of X t . r , piXt^X^XlJ, is strongly log- 
concave. Its unique mode is denoted by /*. 

2) The - log of OL given X\ s , E Yt (X\ a ,X t .r) is twice 
continuously differentiable almost everywhere and is 
locally convex in the neighborhood of /*. Let TZlc Q 
M. Mr denote the largest convex region in the neighbor- 
hood of /; where V^ t r £V t (^, s , X t>r ) > (E Yt as a 
function of X t<r is locally convex). 

3) There exists an e > such that 

inf max [7 P (^t,r)] > 1 (8) 



where 



lp{X t .r) 



|[V-P'1pI 
eo + |[VB n ] P |- 



if X t , r e Ap 



l[V-P']pl if y c 57 

eo-UVBvJ,,!' lJ At > r G 



(9) 



A p 4 {X t , r G TZlc ■ [VD%.[VE Yt ) p < 0} 

Z p = {X t .r € TZlc '■ 

[VE Yt ] p .[VD% > & \[VE Yt ] p \ < e } (10) 

VE Yt ±Vx t , r E Yt (Xi,.,Xt,r) 

VD^V Xt ^D\X t .r) (11) 



Proof: In the proof, V is used to denote Vx t , r - Also, 
we remove the superscripts from U and D l . p**- l (X t .r) will 
be unimodal iff L defined in (7) has at most one minimizer. 



We obtain sufficient conditions for this. Condition 1) ensures 
that D is strongly convex everywhere with a unique minimizer 
at /*. Condition 2) ensures that TZlc exists. By definition of 
TZlc, E Yt is convex inside it. Thus the first two conditions 
ensure that L is strongly convex inside TZlc- So it has at most 
one minimizer inside TZlc- 

We now show that if condition 3) also holds, L will have 
no stationary points (and hence no minimizers) in TZ C LC or on 
its boundary. A sufficient condition for this is: 3 eo > s.t. 

max |[VL] P | > e , VX t , r e TZ C LC (12) 

p 

We show that condition 3) is sufficient to ensure (12). Note 
that VL = \7E Yt + VD. In the regions where for at least one 
p, [VE Yt ] p .[VD]p > (have same sign) and |[V£VJ P | > e , 
condition (12) will always hold. Thus we only need to worry 
about regions where, for all p, either [V E Yt ] p .[V D] p < or 
[V E Yt ]p.[V D] p > but |[V£VJ P | < e . This is the region 

^pMAp U Z p ) 4 Q, A p , Z p defined in (10) (13) 

Now, D only has one stationary point which is /* and it lies 
inside TZlc (by definition of TZlc), an d none in TZ C LC . Thus 
VD ^ in TZ C LC and, in particular, inside Q C TZ C LC . Thus 
if we can find a condition which ensures that, for all points 
in Q, for at least one p, [VL] p "follows the sign of [V-D] p " 
(i.e. [VL] p > e where [V£)] p > and [WL] P < -e where 
[VD] P < 0), we will be done. 

We first find the required condition for a given p and a point 
Xt, r € Q- For any p, if X t . r € Q, then it either belongs to A p 
or belongs to Z p . If X tj7 . £ A p , \[VL] P \ > eo if 



> 1 



(14) 



e + \[VE Yt ] p \ 

This is obtained by combining the conditions for the case 
[V.D]p > and the case [VD] P < 0. Proceeding in a similar 
fashion, if X t . r £ Z p , |[VL] P | > e if 

|[V£>] P | 



> 1 



(15) 



eo-|[V£Vj P | 

Inequalities (14) and (15) can be combined and rewritten 
as j p (Xt,r) — 1 > where 7 P is defined in (9). For (12) 
to hold, we need |[VL] P | > eo for at least one p, for all 
Xt, r € Q. This will happen if infx t r £g max p 7 p (X tiT .) > 1. 
But this is condition 3. Thus condition 3) implies that L has 
no minimizers in TZ C LC . Thus if conditions 1), 2) and 3) of the 
theorem hold, L has at most one minimizer which lies inside 
TZlc- Thus p**' t (X t .r) has a unique mode which lies inside 
TZlc, i- e - it is unimodal. ■ 

The most common example of a strongly log-concave pdf is 
a Gaussian. When the STP of X t . r is Gaussian with mean (= 
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mode) /*, the above result can be further simplified to get an 
upper bound on the eigenvalues of its covariance matrix. First 
consider the case when the covariance is diagonal, denoted A r . 
In this case, D*(X t , r ) = £ p {[Xt ^J^ and so [WD% = 
[Xt,T-f r ]p _ gy substituting this in condition 3), it is easy to 
see that we get the following simplified condition: 

inf max[ 7 "" m (X t , r ) - A r , p ] > (16) 

eo + |[V£y t ] p |> A *>r G ^ 

(17) 

[Xt,r — fr]p SfY C 7 

eo-|[VByJp|' ^ A *<'' G 

^ 4 G : [X t , r - /;] P .[VSyJ p < 0} 

-Zp = {X t .r € T^iC : 

[V£y t ] p .LY t , r - /*]„ > & \[VE Yt ] P \ < e } (18) 

Also, since max p Lgi(p) - 02 (p)] > maxp^p) -max p g 2 (p) 
for any two functions, 51, g 2 , a sufficient condition for (16) is 

maxA rp < inf max[ 7 ™ um (X t r )] 4 A* (19) 

Thus, we have the following corollary. 

Corollary 1: When the STP of Xt. r is Gaussian with mean 
fl and diagonal covariance, A r , p**' 4 (A" t;r ) is unimodal if 
(a) condition 2) of Theorem 1 holds and (b) there exists an 
e > s.t. (16) holds with 7™ 1 "™ defined in (17) and A p ,Z p 
defined in (18). A sufficient condition for (16) is (19). 

Now consider the case when the STP of X t r is Gaus- 
sian with non-diagonal covariance, S r = UA r lI T . Define 
Xt, r — U T X t .r- Since X t , r is a one-to-one and linear function 
of X ttr , it is easy to see that p**' z (X tt1 .) is unimodal iff 
p**^(X t , r ) = piXt^Xl^Xi^Yt) is unimodal. The STP 
of X t , r is JV{U T f l rl A r ). Also, its OL is p{Y t \X l ts ,UX^ r ). 
Define E Yt (X t}7 .) ± E Yt (UX t}7 .). 

Corollary 2: When the STP of X t . r is Gaussian with mean 
/* and non-diagonal covariance, S r = UA r U T , p**' 4 (X t>r ) 
is unimodal if the conditions of Corollary 1 hold with Ey t 
replaced by E Yt ; p r replaced by U 1 p r and X t:V replaced by 
X tir everywhere. 

To summarize the above discussion, p**' 1 is unimodal if 

1) The STP of Xt tr is strongly log-concave (e.g. Gaussian), 

2) The mode of the STP of X t . r is "close enough" to 
a mode of [OL given X% s ], so that condition 2) of 
Theorem 1 holds. Denote this mode by X*. 

3) The maximum spread of the STP of X t ^ r is "small 
enough" to ensure that condition 3) of Theorem 1 holds. 
In the Gaussian STP case, this translates to the maximum 
eigenvalue of its covariance being smaller than A*, 
defined in (19). A* itself is directly proportional to the 
distance of X* to the next nearest mode of [OL given 
X\ J and inversely proportional to its strength. 

The last two conditions above automatically hold if [OL given 
X\ s ] is strongly log-concave (7l c LC is empty and so A* = 00). 



III. A Generic State Space Model for LDSS 

For many problems, and, in particular, for many large 
dimensional state space (LDSS) problems, the state space 
model can be expressed as follows with X t — [Ct,v t ] (a 
generalization of the constant velocity motion model): 

Y t = h c ,w(Ct,w t ), w t ~p w (.) 

C t - C t _i +g Ct . 1 (Bv t ), B 4 B(C t _i) 

vt = fv(vt-i) + vt, vt ~ Af(0, A„), A„ diagonal (20) 

The noises v t , w t are independent of each other and over time. 
If hc, w is one-to-one as a function of w t , and its inverse is 
denoted by g(C t ,Y t ), the OL can be written as 

p{Y t \Ct)=p w {g{C u Y t )) (21) 

Then its -log, E Yt (C t ) = - log p w {g{C t ,Y t )). In cer- 
tain problems, it is easier to directly specify p(Y t \Ct) = 
(3exp[—E Yt (Ct)]. In the above model, C t denotes the LDSS 
quantity of interest, for e.g. it may denote the M contour point 
locations or it may denote temperature (or any other physical 
quantity) at M sensor nodes. The quantity V t = Bv t often 
denotes the time "derivative" of C t and is assumed to follow a 
first order Markov model. If Ct belongs to a smooth manifold 
S, then Vt belongs to the tangent space to S at C t . gc(V) 
denotes the mapping from the tangent space at C to S, while 
if S is a vector space, then gc{V) = V. In this work, we 
only study the vector space case. We develop the same ideas 
for the space of contours (a smooth manifold) in [11]. Related 
work on defining AR models for smooth manifolds is [28]. 

Note that in the above model, the system noise dimension 
(and hence the importance sampling dimension) is M = 
dim(i/ t ) = dim(f t ), and not 2M, and this is what governs 
the number of particles required for a given accuracy. 

We discuss some LDSS examples below. 

Example 1 (Temperature Tracking): Consider tracking 
temperature at M locations using a network of sensors. Here 
S is a vector space and so gc{V) = V- Let C t . p denotes 
temperature at location p, p — 1, . . . M and Vt. v denote the 
first derivative of temperature at node p. Vt is assumed to 
be zero mean and its dynamics can be modeled by a linear 
Gauss Markov model (as also in [14]), i.e. 

C t - C t -i + V, V t = A v Vt-i + n t , n t ~ 7V(0, £„) (22) 

Since Vt is usually spatially correlated, E„ may not be 
diagonal. Let the eigenvalue decomposition of S„ is S„ = 
BA V B T . Define v t = B T V t , v t = B T n t , f v (v) = B T A v Bv 
and gc(V) = V. For simplicity, we use Ay = al and so 
f v (v) = B T AyBv = av. With f v (v) = av, B is also the 
eigenvector matrix of the covariance of V t . Then (22) can be 
rewritten in the form (20) as 

C t =C t -i + Bv t 

v t = avt-i + v u v t - Af(0, A v ) (23) 

Temperature at each node, p, is measured using J (J = 1 
or 2) sensors that have failure probabilities a p \j = 1,2. 
Note that there may actually be two sensors at a node, or two 
nearby sensors can be combined and treated as one "node" 
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for tracking purposes. Failure of the JM sensors is assumed 

to be independent of each other and over time. If a sensor 

(i) 

is working, the observation, Y t \ p , is the actual temperature, 
C tjP , or some function of it, h p (C t , p ), plus Gaussian noise 
with small variance, a 2 bs (independent of noise in other 

sensors and at other times). If the sensor fails, Y tp is either 
independent of, or weakly dependent on C t . p (e.g. large 
variance Gaussian about Ct, p ). An alternative failure model 
is being some different function, h?, of C tyP plus noise. 
In all the above cases, the OL can be written as 

M J 

P(Yt\C t ) = [] l[p(Y$\C t , p ), where 

P(Y t { i } \C t , p ) - (1 - aW) tf(Y$; h p (C t , p ),a 2 obSiP ) 

+ a® p f (Y$\C t , p ) (24) 

We simulated two types of sensors h p (C t:P ) = C t . p (linear) 
and h p (C t ,p) — C 2 p (squared). Note that a squared sensor is 
an extreme example of possible sensing nonlinearities. First 
consider J = 1 (one sensor per node), h p (C t , p ) — C tjP ,Vp 
(all linear sensors), and PfiYt^Ct.p) — Pf(Y t [ p ) (when 
the sensor fails, the observation is independent of the true 
temperature). In this case, each OL term is a raised Gaussian 
(heavy-tailed) as a function of C t , p and so it is not strongly 
log-concave. For a given p, p*(Ct, p ) will be multimodal when 
Y^ p is "far" from the predicted temperature at this node 
and the STP is not narrow enough. This happens with high 
probability (w.h.p.) whenever the sensor fails. See Fig. 1(c). 
A similar model is also used in modeling clutter [10], [20]. 

Now consider J = 2, all linear sensors and Pf(Y} p |C i;P ) = 
Pf(Y} p ). Whenever one or both sensors at a node p fail, 
the observations Y t ^ o , Y^ 2 po will be "far" compared to a 2 bs 
w.h.p. In this case, the OL will be bimodal as a function 
of C tjPo since p{Y t ^ p \C t , p ) can be written as a sum of four 
terms: a product of Gaussians term (which is negligible), 
plus K\ + K 2 N{Y t { 2 ■ C t , Po , a 2 obs ) + K 3 JV(Y t (2) o ; C t 
where Ki,K 2 ,K 3 are constants w.r.t. C t . This is bimodal 
since the modes of the two Gaussians, Y t ^ Q , Y^ 2 pa , are "far". 
See Fig. 1(b). If no sensor at a node fails, both observations 
will be "close" w.h.p.. In this case all four terms have 
roughly the same mode, and thus the sum is unimodal. 
When pf(Y^ p \C ttP ) is weakly dependent on C t , p (e.g. a 
large variance Gaussian), ifi , i\~ 2 , if 3 are not constants but 
are slowly varying functions of C t . A similar argument applies 
there as well. 

(7) 

A squared sensor results in a bimodal OL whenever Y t \ p 
is significantly positive. Squared sensor is one example of a 
many-to-one measurement function. Other examples include 
bearings-only tracking [3] and illumination tracking [12], [13]. 

Example 2 (Illumination/Motion Tracking): The illumina- 
tion and motion tracking model of [12], [13] can be rewritten 
in the form (20). In this case, the OL is often multimodal since 
the observation (image intensity) is a many-to-one function of 
the state (illumination, motion), but conditioned on motion, it 
is often unimodal. The STP of illumination is quite narrow. 

Example 3 (Contour Tracking, Landmark Shape Tracking): 



Two non-Euclidean space examples of the LDSS model (20) 
are (a) the contour tracking problems given in [11], [29], [17] 
and (b) the landmark shape tracking problem of [20], [10]. 

IV. Choosing the "Multimodal States" for LDSS 

In Sec.IV-A below, we apply Theorem 1 to the generic 
LDSS model, (20), and show an example of verifying its 
conditions. Practical ways to select X t . s are given in Sec.IV-B. 

A. Unimodality Result for LDSS model 

Consider a model of the form (20) with gc(V) = V. 
Assume that v t can be partitioned into v t = [vt, s ',Vt, r ] where 
v t .s denotes the temperature change coefficients along the 
"multimodal" directions of the state space and v t . r denotes 
the rest. Thus, X t . s — v t . s and X tjr — [vt,r,C t ]. Similarly 
partition B = [B s ,B r ], A„ = diag(A u ^ S} A^ r ) and v t — 
[vt,s]v t: r\. We discuss how to choose v t . s and v t . r in Sec. 
IV-B. The "multimodal" dimension, K = dim(u t s ) and 
M r = M - K. Denote 

ci±ci_ 1+ B s vi, fi±f v M-i) 

Then we have 

P**' l (v t , r ,C t ) 

= P(v t ,r , C t I V\_ l , C\_ ! , v{ s , Y t ) 

= C% r ;/;,A, r ) S(C t - [Ci + B r vt, r ]) p{Y t \C t ) 

= CA/Kr; fU ^r)p{Yt\C\ + B rVt ,r)S(C t - [Q + B r v t , r }) 

4 P*^M 6(C t - [Ct + B r v t , r }) (25) 

where S denotes the Dirac delta function and ( is a pro- 
portionality constant. Since C\ is a deterministic function 
of C(_i) v\ s' v t,r, its pdf is a Dirac delta function (which 
is trivially unimodal at CI + B r vt, r )- Thus for the purpose 
of importance sampling, X t:T = Vt,r only, and we need 
conditions to ensure that p**' % (v t: r) is unimodal. In this case, 
L l {v tt r) — ~ logp*"(vt,r) + const becomes 

M-K ,r -j, ^ 2 

L i (v t , r )±E Yt (& t +B r v t , r )+ ]T ([Vt ; A UM (26) 

P =l ' ' y 

Applying Corollary 1 we get, 

Corollary 3: Consider model (20) with gc{V) = V- Corol- 
lary 1 applies with the following substitutions: Xt. s = «t, s , 

X t , r = Vt.r, M r = M - K, A r , p = A^ r , P , P r = /«, r («{_i), 

X7E Yt = BjW c E{C l t +B r v t ,r), H LC C R M - K is the largest 
convex region in the neighborhood of /* where Ey t (C\ + 
B r vt,r) is convex as a function of v t . r - 

We demonstrate how to verify the conditions of Corollary 
3 using a temperature tracking example. We use numerical 
(finite difference) computations of gradients. Here eo needs 
to be chosen carefully, depending on the resolution of the 
discretization grid of v t . r - It should be just large enough 1 so 
that one does not miss any stationary point of Ey t ■ 

'If eo is too small, [V-Eyjj, may transition from a value smaller than — eo 
to a value larger than +eo (or vice versa) over one grid point, and this region 
will not be included in Z p (even if [VD] P has the same sign as [VEyilp), 
thus getting a wrong (too large) value of A*. If eo is larger than required, 
the region Z p may be bigger than needed, thus giving a smaller value A* 
than what can actually be allowed. 
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(f) (g) (VEy-Jp = 0,p = 1,2 (h)(VL) p = 0, A r =0.9A* (i) (VL) P = 0, A r = 1.1A* 

E Yt (vt,r) 

Fig. 2. Computing A* for Example 4. We used a w = [0.1,0.1,0.1], q (2) = [0.4,0.4,0.4], pf{Y t ^) = Unif{-10, 10), j = 1,2, Vp, 
a 2 obs = [1,1,1], A„,i = 5.4, B = [-0.27,0.96,-0.02]'; [0.33,0.11,0.94]'; [0.90,0.24-0.35]' (we use MATLAB notation). Also, C\_ x = 
[0,0,0]', u t l _i,r = [0 0]', «l_i, s = 0, F^i' 2) = [5.36,0.59], F t C |' 2) = [-2.25,-1.60] F t ( j' 2) = [-0.68,0.35] and < s = -3.2 (simulated 
from jV"(0, A„,i)). Fig. 2(a): region IZlc, and the point /* = vl-i,r which lies inside it. Fig. 2(b), 2(c), 2(d), 2(e): the regions, Ai O A2, 
Z\ n A2, Zi n Ai and 2i n Z2, along with the computed minimum value of max p 7 P (vt,r) in the 4 regions (1.79, 1642.6, 403.7, 4771.4). 
The final value of A* is the minimum of these four values, i.e. A* = 1.79. Fig 2(f): mesh plot of Ey t as a function of vt, r - Note the 2 
dominant modes. Fig 2(g): contours of [V-Ey t ]i = and [V-Ey t ] 2 = (obtained using the contour command to find the zero level set of 
[VE Yt ]j,j = 1, 2). The contours have many points of intersection (points where VEy t = 0), i.e. many stationary points. Fig 2(h): contours 
of [VL]i = and [VL] 2 — for L computed with A„ l2 = A„,3 = 0.9A*. The contours have only one point of intersection which is a 
minimum. Fig 2(i): contours of of [VL]j = 0, j = 1,2 for A„ i2 = A„,3 = 1.1A*. There are 3 intersection points (2 are minima). 



Example 4: Consider Example 1. Assume that M = 3 and 
OL follows (24) with h p (Ct, p ) = Ct. p (linear sensors) and 
Pf(Yt,p\Ct, P ) = Pf(Y t b p } ). Also, let a - 1. In Fig. 2, we 
demonstrate how to verify the conditions of Corollary 3. Let 
K = 1, i.e M r = 2. Assume that X t}S = «t,s = Vt.i and 
Vt, r = Vt,2:3- Assume a given value of C\_y, p r and of Y t 
(given in the figure caption). Note that Y$ — 5.36, Y^f = 
0.59 are "far" compared to a b s .i — 1 and hence the OL 
is multimodal. Fig. 2(f) plots EY t (vt, r )- Fig. 2(g) plots the 
contours of [WEy t ] p = 0,p = 1,2 (the points where the red 
and blue contours intersect are the stationary points of Ey t )- 

Verification of condition 2 is shown in Fig. 2(a). Next, 
we show the steps for computing A*. For M r = 2, Q = 
Hp =1 (^4 p U Z p ) is a subset of M 2 and is a union of the 4 
regions: A\ n Ai, Z\ n A%, A\ n Z 2 , Z\ n Z 2 , shown in Fig 
2(b), 2(c), 2(d), 2(e). The computed value of the minimum of 
max,, r )p Um {vt : r) in each region is also given in the titles. 
The final A* = 1.79 is the minimum of these 4 values. 
Contours of [VL']i = and of [VL 1 ^ = computed for 
A„ : 2 = A„ :3 = 0.9A* and 1.1A* are shown in Figs. 2(h), 
2(i). Notice that when A^.2 = A„ f 3 = 0.9A*, they intersect at 
only one point i.e. WL l = at only one point (one stationary 
point). When A„ 2 = A^ 3 = 1.1A*, there are 3 stationary 
points (and 2 are minima). 

B. Choosing the "Multimodal" States, X t , s 

Corollary 3 gives a unimodality condition that needs to be 
verified separately for each particle and each Y t at each t. An 
exact algorithm to do this would be to begin by checking 



at each t, for each i, if Theorem 1 holds with K = 0. 
Keep increasing K and doing this until find a K for which 
Corollary 3 holds conditioned on X\ 1 . K . This can be done 
efficiently only if A* can be computed analytically or using 
some efficient numerical techniques. That will be the focus of 
future research. But, as discussed earlier, PF-EIS works even 
if unimodality of p**' l (X t . r ) holds for most particles at most 
times, i.e. it holds w.h.p. 

We use the temperature tracking problem of Example 1 to 
explain how to choose X t , s . For a given K, we would like to 
choose X t . s = Vt.s that makes it most likely for p** ,! (w t r ) to 
be unimodal. Given X\_ x , v\ s , v t , r is a linear function of C*. 
If Vt, r were also a one-to-one function of Ct, then one could 
equivalently find conditions for unimodality of p**' l (Ct), 
which is easier to analyze. For an approximate analysis, we 
make it a one-to-one function of Ct by adding a very small 
variance (compared to that of any Vt,p) noise, n tyS , along B s , 
i.e. given Xl_ x ,v\ ;S , set C t = Ct_i+-B s u| jS +B r w*,r+-B a nt,8. 
Now, Ct is a one-to-one and linear function of [«t r , n t ^ s ]. This 
also makes p**' l {Ct) a non-degenerate pdf. 

First consider the case where w.h.p. OL can be multimodal 
as a function of temperature at only one node po, for e.g., 
hp(C t , p ) = C t , p , Vp 7^ po, ap = 0, Vp 7^ po, and either 
a p() > or h po (C t , Po ) is many-to-one. Then, 

P**' l {C t ) = Cp{Yt,po\Ct, P MCt,p \XU,vl s ) x 

[ II p(Yt, P \Ct, P )}p(C t \Ct, P0 ,XU,vl s ) (27) 

P^Po 



s 



and the last two terms above are Gaussian (and hence 
strongly log-concave) as a function of C t . p , p ^ po- If 
P**' l (Ct, Po ) is also strongly log-concave then p**- l (C t ) (and 
hence p**' l (vt t r)) will be strongly log-concave, and hence 
unimodal. Now, p**' l (Ct. Po ) will be strongly log-concave if 
3 e > such that A c , p „ = Var^Ct^Xl^vl )] < 



inf 



^■■ V2 c t , Pn E n(C t )<0} | V j. E Yt (C t )\+e - 



This bound can 

J t,P " - • I c t , Pa r 

only be computed on the fly. A-priori, P**' l (Ct, Po ) will be 
most likely to be log-concave if v t , s is chosen to ensure 
that Ac, Po is smallest. Let v t , s = v t ^ where the set fc 
contains K elements out of [1, . . . M] and K is fixed. Then, 
Ac, Po = J2k<£k B l a ,k^v,k- Tnis ignores the variance of n t . s 
(valid since the variance is assumed very small compared to 
all A^p's). Thus, Ac jPo will be smallest if v t . s is chosen as 



v t .s = v t , ks , k s = argmin ^ B p ,j A »,k 



(28) 



kikn 



When K = 1, this is equivalent to choosing k s = 
argmaxfe B 2 ()k A v ^- Based on the above discussion, we have 
the following heuristics. 

Heuristic 1: If OL can be multimodal as a function of 
temperature at only a single node, p n , and is unimodal as a 
function of temperature at other nodes, select v t . s using (28). 

Heuristic 2: If OL is much more likely to be multimodal 
as a function of C t . Pa , compared to temperature at any other 
node (e.g. if a sensor at po is old so that its failure probability 
is much larger than the rest), apply Heuristic 1 to that po. 

Heuristic 3: When p is a set (not a single index), Heuristic 
1 can be extended to select k s to minimize the spectral radius 
(maximum eigenvalue) of the matrix, 2~2k<jtk B Po,kBp o k A„ tk . 

Heuristic 4: If OL is equally likely to be multimodal as 
a function of any C t . p (e.g. if all sensors have equal failure 
probability), then p = [1, . . . M]. Applying Heuristic 3, one 
would select the K largest variance directions of STP as i> tiS . 

Heuristic 5: If the probability of OL being multimodal is 
itself very small, then K = can be used. In Example 1 with 
all linear sensors, this probability is roughly 1 — JJ . (1 — a p ). 

Heuristic 6: For J = 2 and all linear sensors, po may be 
chosen on-the-fly as argmax p [(Y t y - Y^J ) 2 /<y 2 obsp ] (larger 
the difference, the more likely it is for OL to be multimodal 
at that p). If the maximum itself is small, set K = 0. 

We show an example now. Consider Example 4 with 
a (i) = a (2) = [0.4,0.01,0.01], = dmg([10,5,5]), B = 
[0.95, 0.21, 0.21]'; [-0.21, 0.98, -0.05]'; [-0.22, 0, 0.98]' (us- 
ing MATLAB notation). By Heuristic 5, the probability of 
OL being multimodal is about 0.65 which is not small. So 
we choose K > (K = 1). By Heuristic 2, we choose 
Po = 1 since OL is multimodal as a function of C t ,\ with 
probability 0.64, while that for Ct.i or Ct.z together is 0.02 
(much smaller). Applying (28) for po — 1, we get vt. s = «t,i. 

V. PF-EIS-MT: PF-EIS with Mode Tracker 

For any PF (including efficient PFs such as PF-EIS or 
PF-Doucet), the effective particle size [4], [6] reduces with 
increasing dimension, i.e. the N required for a given track- 
ing accuracy increases with dimension. This makes all PFs 
impractically expensive for LDSS problems. We discuss one 
possible solution to this problem here. 



A. PF-EIS-MT and PF-MT Algorithm 

Consider the LDSS model (20). To apply PF-EIS, we split 
the state X t into [X t . s , X t . r ], such that p* is unimodal w.h.p. 
conditioned on X t . s - As explained earlier, this is ensured 
if the eigenvalues of S r are small enough to satisfy (19). 
Now, because of the LDSS property, X tjr can further be split 
into [X t ,r,s, X t . r ,r} so that the maximum eigenvalue of the 
covariance of the STP of X t . r . r is small enough to ensure that 
there is little error in approximating the conditional posterior 
of X t .r,r by a Dirac delta function at its mode. We call 
this the Mode Tracking (MT) approximation of importance 
sampling (IS), or IS-MT. We refer to X t . s = [X t , s , X t , r ,s} as 
the "effective" state and to X tfT — -X^ r _ r as the "residual" 
state. We explain IS-MT in detail below. 

In PF-EIS, we IS X\ s from its STP, and we EIS X\ r from 



where m 
and Yj\ s 



first sampling X\ r s 



X' irir ~JVKy,£* Sir ) where 



t , T,\ s are defined in (2). Let ml 

^IS,s ^IS,s,r 

Af(m l t s , £j S J and then sampling 



This is equivalent to 



m; 



IS,r,s^IS,s 



m; 



IS,r — L '« - ~ ^ 



IS,r 



J IS,r,s^IS,s 
< 



J IS,r,s 



(29) 



Now, from (29), Y>* ISr l < T,} Sr . Also, since m\ 
lies in a locally convex region of Ey t {X\ s , X t . r ), i.e. 
V 2 E Yt {X l ts ,m\) > (by Theorem 1), £} s < A r . This 



implies that A r r — £} s r , which is a square sub-matrix of 
A r — is also non-negative definite. Thus, 



J IS,'. 



< E 



is. 



< A r 



(30) 



If the maximum eigenvalue of A r ^ r is small enough, any 
sample from J\f(m^ r \Y,* ISr l ) will be close to m* t r % w.h.p. 
So we can set X\ rr = ml r l with little extra error (quantified 
in the next subsection). The algorithm is then called PF-EIS- 
MT. It is summarized in Algorithm 3. A more accurate, but also 
more expensive modification(need to implement it on-the-fly) 
would be do MT on the low eigenvalue directions of £} s . A 
simpler, but sometimes less accurate, modification is PF-MT 
(summarized in Algorithm 4). In PF-MT, we combine X tt r,s 
with X tiS and importance sample the combined state X tjS — 
[X t .s, X t , r ,s] from its STP (or in some cases X t , r ,s is empty), 
while performing mode tracking (MT) on X t . r = X t . r . r . 

The IS-MT approximation introduces some error in the 
estimate of X t . r , r (error decreases with decreasing spread of 
p**' 1 (Xt.r.r))- But it also reduces the sampling dimension from 
dim(X t ) to dim([X t . s ; X t , r ,s]) (significant reduction for large 
dimensional problems), thus improving the effective particle 
size. For carefully chosen dimension of X t . r ,r, this results in 
smaller total error, especially when the available number of 
particles, N, is small. This is observed experimentally, but 
proving it theoretically is an open problem. We say that the 
IS-MT approximation is "valid" for a given choice of X t . r .r 
if it results in smaller total error than if it were not used. 

B. IS-MT Approximation 

We quantify the error due to IS-MT. If we did not use the 
MT approximation, X\ r r - M{m* t r \ E} S r 4 ). But using MT, 
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Algorithm 3 PF-EIS-MT. Going from jr£i to n?{X t ) = £f =1 w^S(X t - XI), X\ = [X t *„Xj, r ]. X|, r = [X| ir , s , X\^ r \ 



l). 



1) Importance Sample X t . s : Vi, sample 

2) Efficient Importance Sample X t , r ,s- Vi, 

a) Compute m\(Xl_ 1 , X\ , Y t ) = argminx t r L l (X t , r ) and £} s = (V 2 I/(mJ)) _1 where L' is defined in (7). Let 



"t.s 



b) Sample X. 



t.r.s 



and £} s 



)• 



3) Mode Track X t , r ,r-' Vi, 

a) Compute m£/ using (29). 

b) Set X| iT . r = m* / 

4) Weight: Vz, compute = 

5) Resample. Set £ 



where wl 



t sr N 
t+1 and go to step 1 . 



w 



t-i 



A/pf|~ mj, S} s ) WIlere ~ i^t,r,si ^t,r,r\- 



Algorithm 4 PF-MT. Going from to ir?{X t ) = 



1) Importance Sample X t . s : Vi, sample A t J s <~ s |-2Q_i). 

2) Mode Track X t , r : Vz, set X\ r — m\ where ml(Xl_ 1 , Xl s , Yt) = argmin^ L l {X t .r) and L' 1 is defined in (7). 

3) Weight: Mi, compute w\ = where w\ = w\_ xV {Y t \X^ V {Xl r \X\_ x ,Xl s ) where X\ = 

4) Resample. Set t <— t + 1 & go to step 1. 



we set X, 

E 



t r r — TOj r . Let the eigenvalue decomposition of 



UA* ISr l U T and let A p ^ (A} s r ' 



eigenvalue. Let d = X\ r r - 

probability of ||cf|| = ||C/ T (i|| > e using Chernoff bounding: 



ml r \ For an e > 0, we bound the 



Pr(\\d\\>e) = Pr(\\U T d\\>e) 

= p r ( e fT, P (u T d)l > e 

<ni(i-2A pS )- i / 2 e 

p 

< [ (1 - 2A m s, 

< [ (1 - 2A m s)- 1 / 2 e- se2 / Ml 



) 

se 2 /M r , r ) 



l/2 e -se 2 /M r:r ) 



M r ,. 

) lM r 



(31) 
(32) 

where s > 0, A m = max p A p and A m = max p 
first inequality follows by applying Markov inequality, the 
second follows because A p < A TO , Vp and (32) follows because 
A m < A m which follows from (30). Now, (32) holds for any 
s > and thus 

Pr(\\d\ \ > e) < [min{(l - 2A m s)- 1 / 2 e - se ^ M ^)}} M ^ 



s>0 



M r , r /2 



^5(A m ,e) (33) 



Rewriting [B(A TO ,e 



12/M r , r _ 



= ( 



A/,. 



0-Vc 



-1) 



and 

= 0. 



applying L'Hospital's rule, we get limA m -^o B(A m , e) 
Note that, if instead of (32), we applied min s >o to (31), we 
would get Pr(\\d\\ > e) < B(X m ,e). Thus, 

Theorem 2: Consider any HMM model (satisfying As- 
sumption 1) and assume that the conditions of Theorem 1 hold. 
Let X\ r T ~ N{m* t r \ T,* IS J). Then lim A 
mi J\\ > e) = and also lirm^o Pr(\ I 



m ^Pr(\\X^ r - 

x\^ r -mi;\\ > 



e) = 0, i.e. X\ rr converges in probability to m* t r l in the 
Euclidean norm as A TO = max p A rir;P — > and also as 

A m = max p (A| Sr 4 ) PiP -> 0. 



Remark 1: Even if the conditions of Theorem 1 do not hold 
(inequality (30) does not hold), we can still prove Theorem 2 if 
we assume that Y>* IS r l = Covar[p**' l {X t , r , r )] (actually Y>) s 
is only an approximation to Covar[p**- 1 (X t , r ,r)])- The result 
will then follow by using the conditional variance identity [30, 
Theorem 4.4.7] to show that E Yt [^is,r} < A r,r- 

In summary, PF-EIS-MT can be used if p**' l (X ttr ) is 
unimodal w.h.p. and the largest eigenvalue of T,* IS r l is small 
enough to ensure the validity of IS-MT. A sufficient condition 
is that the largest eigenvalue of A r , r be small enough. The 
choice of e is governed by the tradeoff between the increase 
in error due to IS-MT and the decrease due to reduced IS 
dimension. This will be studied in future work. 



C. Choosing the MT-Residual states, X t , T , r 

We first choose an X t , s ,X t , r for the EIS step using the 
unimodality heuristics discussed earlier in Sec. IV-B. Then 
we split X t ,r into X t , r ,s and X t , r ,r so that IS-MT is valid 
for X t , r ,r- Then PF-EIS-MT can be implemented with the 
chosen X ttS , X t . r . s , X t . r . r . Alternatively, one can implement 
PF-MT (faster) with X t ' s = [X t ,,; X t , r ,s], X t>r = X t ^ r . For 
a given value of e, e 2 , two approaches can be used to choose 
X t , r ,r- The first is offline and finds the largest A m so that 
B(A m ,e) < e 2 . The second is online, i.e. at each t, for each 
particle i, it finds A m so that B(X m ,e) < e 2 . 

Heuristic 7: Begin with M r r = M r and keep reducing its 
value. For each value of M r , r , choose the states with the M r>r 
smallest values of A„ ;T . iP (so that max p A^ r ^ rjP is smallest) as 
Xt, r ,r- With this choice, compute B(A m , e) and check if it is 
smaller than e 2 . If it is smaller, then stop, else reduce M r>r 
by 1 and repeat the same steps. A second approach is to do 
the same thing on-the-fly, using B(\ m , e). 
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D. Connection with Rao-Blackwellized PF (RB-PF) 

We first discuss the connection of PF-MT to RB-PF. PF-MT 
can be interpreted as an approximation of the RB-PF of [9]. 
The RBPF of [9] is applicable when the state vector can be 
split as X t = [X t ,ni,X t ,i] with the following property: X t , n i 
has any general nonlinear or non-Gaussian state space model; 
but conditioned on Xi :t . n i, X t ,i has a linear Gaussian state 
space model. Thus the RB-PF of [9] importance samples X t} ni 
from its STP but applies the Kalman recursion to compute 
the conditional prediction and posterior densities (both are 
Gaussian) of X t .i conditioned on each particle X\. t n[ . The 
OL of each particle X\. t n[ , is computed by marginalizing over 
the prediction density of X t j. 

PF-MT can be understood as an approximation to the RB-PF 
in the following sense: replace the "nonlinear" part of the state 
space by X t . s , i.e. X t . n i = X t , s , and the "linear" part by X t . r , 
i.e. Xf t i = X t .r- In PF-MT, the conditional prediction and 
posterior densities of X t , r (conditioned on X\. t s ) are assumed 
to be unimodal (not necessarily Gaussian), but narrow. In 
general, it is not possible to marginalize over any unimodal 
density. But if the product of the STP of X t , r and the OL 
given X\ s is narrow enough to be be approximated by its 
maximum value times a Dirac delta function at its unique 
maximizer, PF-MT can be interpreted as an RB-PF. In that 
case, the conditional posterior of X t<r is also approximated 
by a Dirac delta function. Thus, 

Theorem 3: PF-MT (Algorithm 4) is RB-PF (Algorithm 1 
of [9]) with the following approximation at each t: 

P {Y t \xi s ,x t , r ) P {x t AxU,xi s ) 

= piYtlXl^XlMKrlX^XUSiX^r - Xl r ) (34) 
Xl r =m\ = argmaxb^tlX^.Xt^MXt^lXtx.^J] 

X t ,r 

With the above approximation, the following also holds: 

p**>\X tir ) ^piXt^Xi^X^Yt) = 5{X tir -m\) (35) 

The proof is a simple exercise of simplifying RB-PF expres- 
sions using (34) and hence is omitted. 

For PF-EIS-MT, replace X tj1 . by X t . r , r and X t . s by 
[Xt, s ;Xt,r,s] m the above discussion. Also, importance sam- 
pling from the STP in case of RB-PF is replaced by EIS. 

VI. Relation to Existing Work 

We discuss here the relation of our algorithms to existing 
work. The problem of estimating temperature at a large num- 
ber of locations in a room using a network of sensors is also 
studied in [14], [15]. Their focus is on modeling the spatio- 
temporal temperature variation using an RC circuit, estimating 
its parameters, and using the model for predicting temperature 
at unknown nodes. They assume zero sensor failure probability 
and observation noise (usually valid when sensors are new) 
and hence do not require tracking. In a practical system, one 
can use [14] when sensors are new and reliable, but track 
the temperature using PF-EIS-MT (and the models estimated 
using [14]) when sensors grow older and unreliable. 

For multimodal OL or STP, if there are only a few modes 
at known mode locations, the Gaussian Sum PFs (GSPF-I or 



GSPF-II) of [31] can be used. All examples shown in [31] 
have a one dimensional process noise, and thus effectively a 
one dimensional state. As dimension increases, the number 
of mixands that need to be maintained by GSPF-I increases 
significantly. We compare PF-EIS with GSPF-I in Fig. 3. 
GSPF-II defines a mixand about each possible mode of OL or 
of STP, followed by resampling to prune insignificant modes. 
The possible number of OL modes increases with dimension, 
even though for a given observation, it is highly unlikely that 
all modes appear. For e.g., in case of tracking temperature at 
50 nodes with 2 sensors per node, each with nonzero failure 
probability, the maximum number of possible OL modes at any 
time is 2 50 . Another work that also approximates a multimodal 
pdf by a mixture density is [32]. 

The Independent Partition PF (IPPF) of [33] and the IPPF- 
JMPD of [34] propose efficient PFs for multiple target track- 
ing. There the motion model of different targets is independent, 
while the OL is coupled when the targets are nearby (because 
of correspondence ambiguity between observations and tar- 
gets). The main idea of IPPF is to resample independently 
for each target when the targets are significantly far apart 
(their OLs are roughly independent). In our work, and also 
in other LDSS problems, this cannot be done since the 
temperature (or other state) dynamics of different nodes is 
coupled (temperature change is spatially correlated). 

The main idea of MT was first introduced by us in [29] 
and first generalized in [2], [35], [1]. The work of [36] which 
proposes a "PF using gradient proposal" is related to [29]. The 
MT step can also be understood as Rao-Blackwellization [27], 
[9] if the approximation of Theorem 3 holds. Another recent 
PF that also performs approximate marginalization, but only 
on the stable directions of the state space, is [21]. This can 
be made more efficient by using the EIS idea on the unstable 
directions. Many existing algorithms may be interpreted as 
special cases of PF-EIS-MT, for e.g. PF-Original is PF-EIS- 
MT with X t<a = X t , PF-Doucet is PF-EIS-MT with X t ^ s = 
X t , and the approximate "posterior mode tracker" of [18] is 
approximately PF-EIS-MT with X t . T ^ — X t . 

There is a fundamental difference between MT and the 
commonly used idea of replacing the PF estimate of the 
posterior by a Dirac delta function at the highest weight 
particle (or at the mode of the PF posterior estimate), as in 
[17], or doing this for a subset of states, as in [37]. The latter 
can be understood as an extreme type of resampling which will 
automatically occur in any PF if the largest weight particle has 
much higher weight than any other particle. It still requires IS 
on the entire state space to first get the PF estimate of posterior. 

VII. Simulation Results 

We used Root Mean Squared Error (RMSE) of the PF 
approximation of the MMSE state estimate (from its true 
value) and percentage of out-of-track realizations to compare 
the performance of PF-EIS with that of PF-Original (PF-EIS 
with K = M) [3] and PF-Doucet (PF-EIS with K = 0) 
[6] in Fig. 3. The number of particles (N) was kept fixed 
for all PFs in a given comparison. We also show the RMSE 
plot of GSPF-I [31] with total number of particles (number 
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RMSE from ground truth. N=100 particles RMSE from ground truth. N=50 particles RMSE from ground truth. N-100 particles 




(a) Sensor failure (temperature independent) (b) Sensor failure (weakly temperature dependent) (c) Squared sensor at node 1 
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Fig. 3. Comparing RMSE, out-of-track % and N eff of PF-EIS (black-A) with that of PF-Doucet (red-*), PF-Orig (magenta-o) and GSPF-I 
(magenta -+). RMSE at time t is the square root of the mean of the squared error between the true Ct and the tracked one (TV-particle 
PF estimate of E[Ct | Yi ; t]). Out-of-track % is the percentage of realizations for which the norm of the squared error exceeds an in- track 
threshold (2-4 times of total observation noise variance). In-track threshold for Fig. 3(a) was 48, for Fig. 3(c) was 20 and for Fig. 3(b) was 
12. We averaged over 90 Monte Carlo simulations in Figs. 3(b) and 3(c) and over 40 in Fig. 3(a). Note Co refers to the starting value of Ct- 



of mixtures times number of particles per mixture) roughly 
equal to N. In Fig. 4, we show superior performance of PF- 
MT and PF-EIS -MT over PF-EIS, PF-Doucet, PF-Original and 
PF-Orig-/v-dim (dimension reduced original PF, i.e. original 
PF run on only the first K dimensions). 

Note that for multimodal posteriors, the RMSE at the 
current time does not tell us if all significant modes have been 
tracked or not. But, if a significant mode is missed, it will often 
result in larger errors in future state estimates, i.e. the error 
due to the missed mode will be captured in future RMSEs. In 
many problems, the goal of tracking is only to get an MMSE 
state estimate, and not necessarily view all the modes, and in 
these cases RMSE is still the correct performance measure. If a 
missed posterior mode does not result in larger future RMSEs, 
it does not affect performance in any way 2 . Of course, the 
increase in error due to a missed mode may occur at different 
time instants for different realizations and hence the average 
may not always truly reflect the loss in tracking performance. 

2 The true posterior is unknown. The only other way to evaluate if a PF 
is tracking all the modes at all times, is to run another PF with a very large 
number of particles and use its posterior estimate as the true one. 



Evaluating PF-EIS: We first explain a typical situation 
where PF-Doucet fails but PF-EIS does not. This occurs when 
the STP is broad and the OL is bimodal (or in general, 
multimodal) with modes that lie close to each other initially, 
but slowly drift apart. PF-Doucet uses gradient descent starting 
at C\_i to find the mode. When p* is multimodal, it approx- 
imates p* by a Gaussian about the mode in whose basin- 
of-attraction the previous particle (i.e. Cf^) lies. At t = 0, 
particles of Vt are generated from the initial state distribution 
and so there are some particles in the basin-of-attraction of 
both modes. But due to resampling, within a few time instants, 
often all particles cluster around one mode. If this happens to 
be the wrong mode, it results in loss of track. In contrast, PF- 
EIS samples vt }S from its STP, i.e. it generates new particles 
near both OL modes at each t, and so does not lose track. 

All plots of Fig. 3 simulated Example 1 with M = 3. Model 
parameters used for each sub figure are given in the table in Fig. 
3(d). The example of Fig. 3(a) is a special case of Example 
4. It has M = 3 sensor nodes; J = 2 sensors per node; 
all linear sensors and "temperature-independent failure", i.e. 
Pf(Y$\C ta> ) = p f (Y$) = Af(Y$; 0,100). Temperature 
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RMSE from ground truth. N=100 particles Out-of-track %. N-100 particles RMSE from ground truth. N=50 particles 




(a) Sensor failure (temperature independent) (b) Robustness to model error 
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(c) Table of parameters. The notation denotes a row vector of 6s of length k, e.g. lg. 



Fig. 4. Comparing PF-MT (blue-D) in 4(a) and PF-EIS-MT (blue-+) in 4(b) with PF-Doucet (red-*), PF-EIS (black-A), PF-Orig (magenta-o) 
and PF-Orig-K dim (magenta-x). In Fig. 4(a), M — 10 was used. X t>s — Vt t i was used for both PF-EIS and PF-MT. Averaged over 50 
simulations. PF-MT has best performance. In Fig. 4(b), we test the robustness to error in the failure probability parameter. M = 5 was used. 
We used X t>s — Vt i, X t . r ,s ~ Vt,2 for PF-EIS-MT. Xt, s = v t,i was used for PF-EIS. Averaged over 100 simulations. PF-EIS-MT is the 
most robust when TV = 50 particles were used (available TV is small). If N = 100 particles are used, PF-EIS is the most robust (not shown). 



change followed a random walk model, i.e. a = 1. By 
Heuristic 2, we choose p Q = 1 since OL is multimodal 
as a function of Ct,x with much higher probability than at 
other nodes (we simulate an extreme case). Applying (28) 
for po = 1, we get v t . s = Vt \. This was used for PF-EIS. 
As can be seen, RMSE for PF-EIS was smaller than for PF- 
Doucet and so were the number of "out of track" realizations. 
GSPF-I [31] with G = 8 mixtures and Ng = 7 particles 
per mixture (a total of 56 particles) and PF-Original had 
much worse performance for reasons explained earlier (used 
inefficient importance densities). 

In Fig. 3(b), we simulated "weakly temperature 
dependent sensor failure", i.e. pf(Y^\Ct^ p ) = 
Af(Y t ^; 0.2C tlP , lOOcr^p). Also, sensor failure probability 
at node 1 was lower than in Fig. 3(a). Thus the performance 
of all algorithms is better. 

Fig. 3(c) used J = 1 sensor per node and a squared sensor 
at node 1, i.e. h(Ct) = [Cf^, Ct t i\ Ct^]. All sensors had 
zero failure probability, i.e. = 0,Vp. Temperature change 
followed a first order autoregressive model 3 with a = 0.7. 
In this case OL is bimodal as a function of Ct.\ whenever 
Yt i is significantly positive. This happens w.h.p when tem- 
peratures are greater than ^jZa bs.\ — 2-3 (or less than —2.3) 
which itself happens very often. Also, often, the modes are 
initially nearby and slowly drift apart as the magnitude of Y tt \ 
increases. As explained earlier, this is just the situation that 
results in failure of PF-Doucet. Performance of PF-Doucet is 
significantly worse than that of PF-EIS (which used v t , s — Vt,i 
obtained by applying (28) for po = 1). Note that we initiated 
tracking with an initial known temperature of 5, so that there 
was a bias towards positive temperature values and it was 

3 This example is a difficult one because OL is almost always bimodal with 
two equal modes. With a random walk model on Vt, even N = 100 particles 
were not enough for accurate tracking using any PR 



indeed possible to correctly track the temperature and its sign. 

Using an anonymous reviewer's suggestion, we also plot the 
effective particle size, N e ff, for all the above examples in Fig. 
5. N e ff is equal to the inverse of the variance of normalized 
particle weights [4]. Because of resampling at each t, N e ft 
only measures the effectiveness of the current particles, and not 
how they influence the future posterior estimates. N e ff will 
be high even when most particles cluster around an OL mode 
which in future turns out to be the wrong one, resulting in 
larger future RMSEs. This is why PF-Doucet, which samples 
from the Laplace approximation to the "optimal" importance 
density (optimal in the sense of minimizing the conditional 
weights' variance) has the highest N e ff, but not the smallest 
RMSE. This issue is most obvious for the squared sensor case. 

Time Comparison. We used the MATLAB profiler to 
compare the times taken by different PFs for tracking for 20 
time steps. GSPF-I took 1 second, PF-Original took 2 seconds, 
PF-EIS took 60.2 seconds, and PF-Doucet took 1 1 1.2 seconds. 
GSPF-I and PF-Original took significantly lesser time since 
they do not use gradient descent at all. Note also that the 
gradient descent algorithm used by us was a very basic and 
slow implementation using the fminunc function in MATLAB, 
thus making PF-EIS or PF-Doucet more slower than they 
would actually be. PF-Doucet takes more time than PF-EIS 
because (a) it finds the mode on an M dimensional space, 
while PF-EIS finds mode only on an M— K dimensional space 
and (b) p* is very likely to be multimodal (many times the 
initial guess particle may not lie in the basin-of-attraction of 
any mode and so many more descent iterations are required). 

Evaluating PF-MT and PF-EIS-MT: In Fig. 4, we compare 
the performance of PF-MT and PF-EIS-MT with other PFs. 
The model of Fig. 4(a) was similar to that of Fig. 3(a), but 
with 71/ = 10. We used X t . s = Vt,i, X t . r . s = empty and 
Xt, r ,r = 1^2:10, m i s was a PF-MT with X t .s = V t .l 
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Time, t Time, t Time, t time, t 

(a) N eff for Fig. 3(a) (b) N eff for Fig. 3(b) (c) N eff for Fig. 3(c) (d) N eff for Fig. 4(a) 

Fig. 5. Effective particle sizes (N e ff). Because of resampling at each t, N e ff only measures the effectiveness of the current particles, and 
not how they influence future posterior estimates. It is high even when most particles cluster around an OL mode which in future turns out 
to be the wrong one, resulting in larger future RMSEs. PF-Doucet has highest TV e //, but not lowest RMSE or out-of-track % (see Fig. 3). 



and X t , r = Vt,2:ia- As can be seen from the figure, PF- 
MT outperforms all other algorithms. It outperforms PF-EIS 
because it importance samples only on a K = 1 dim space, but 
performs MT on the other 9 dimensions (which have a narrow 
enough conditional posterior) and so its effective particle size 
is much higher (see Fig. 5(d)). This is particularly important 
when the available TV is small. PF-MT outperforms PF-Doucet 
primarily because of the EIS step (approximated by MT). It is 
much better than PF-Original again because of better effective 
particle size (result of using EIS instead of IS from STP). 
Finally, it is significantly better than PF-K-dim because PF- 
K-dim performs dimension reduction on 9 states (all of which 
are nonstationary) which results in very large error, while PF- 
MT tracks the posterior mode on all these dimensions. Note 
that because of resampling, N e ff may also be very high when 
a PF is completely out-of-track (all particles have very low but 
roughly equal weights). This is true for PF-K-dim (Fig. 5(d)). 

In Fig. 4(b), we evaluate robustness to modeling error 
in sensor failure probability. The tracker assumed failure 
probability = 0.2. The observations were simulated using 
= 0.95. This simulates the situation where a sensor begins 
to fail much more often due to some sudden damage to it. For 
this problem, M = 5. We used X t , s — Vt,i, ^t,r,s — Vt,2 and 
Xt trtr = v t ,3:io i-e. we implemented PF-EIS-MT. PF-EIS-MT 
has the best performance when TV = 50 (available number 
of particles is small) while PF-EIS has the best performance 
when a larger TV, TV — 100 is used (not shown). 

Note that M — 5 or 10 is a large enough dimensional state 
space if reasonable accuracy is desired with as low as TV = 50 
or 100 particles. In practical scenarios (which are difficult to 
run multiple Monte Carlo runs of) such as contour tracking 
[29], [11] or tracking temperature in a wide area with large 
number of sensors, the state dimension can be as large as 100 
or 200 while one cannot use enough particles to importance 
sample on all dimensions. The IS-MT approximation will be 
really useful for such types of problems. 

VIII. Discussion and Future Work 

We have studied efficient importance sampling techniques 
for PF when the observation likelihood (OL) is frequently 
multimodal or heavy-tailed and the state transition pdf (STP) 
is broad and/or multimodal. The proposed PF-EIS algorithm 



generalizes Doucet's idea of sampling from a Gaussian ap- 
proximation to the optimal importance density, p* , when p* is 
unimodal, to the case of multimodal p*. 

Sufficient conditions to ensure unimodality ofp* conditioned 
on the "multimodal states", X t ^ s , are derived in Theorem 1. 
Theorem 1 can be extended to test for unimodality of any 
posterior. Specifically, it can also be extended to problems 
involving static posterior importance sampling. In its current 
form, it is very expensive to verify the conditions of Theorem 
1. But, based on it, multiple heuristics to choose X t , s to ensure 
that p* conditioned on X t , s is most likely to be unimodal 
have been proposed. An unsolved research issue is to either 
find efficient numerical techniques to verify the conditions of 
Theorem 1 on-the-fly or to find ways to modify the result so 
that the selection can be done a-priori. 

We have shown through extensive simulations that PF-EIS 
outperforms PF-Doucet (PF-EIS with K = 0) whenever p* 
is frequently multimodal. But, in other cases, PF-Doucet has 
lower error. An efficient algorithm (in terms of the required 
TV) would be to choose the dimension and direction of X t . s 
on-the-fly using Heuristic 6. 

Increasing N for any PF increases its computational cost. 
Once X t , s is large enough to satisfy unimodality w.h.p., the 
TV required for a given error increases as dimension of Xt jS 
is increased further (for e.g., PF-Original had much higher 
RMSE than PF-EIS for given TV). But, computational cost 
per particle always reduces as dimension of X t , s is increased 
(for e.g. PF-Original took much lesser time than PF-EIS 
which took lesser time than PF-Doucet). For a given tracking 
performance, if one had to choose X t , s to ensure minimal 
computational complexity, then the optimal choice will be a 
higher dimensional X ttS than what is required to just satisfy 
unimodality. Finding a systematic way to do this is an open 
problem. On the other hand, if the goal was to find a PF 
with minimal storage complexity or to find a PF that uses 
the smallest number of parallel hardware units (in case of a 
parallel implementation), the complexity is proportional to TV. 
In this case, PF-EIS (or PF-EIS-MT) with smallest possible 
"multimodal state" dimension would be the best technique. 

As state space dimension increases, the effective particle 
size reduces (variance of weights increases), thus making 
any regular PF impractical for large dimensional tracking 
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problems. The posterior Mode Tracking (MT) approximation 
to importance sampling (IS) for the states whose conditional 
posterior is narrow enough, is one way to tackle this issue. The 
IS-MT approximation introduces some error in the estimation 
of these states, but at the same time, it also reduces the sam- 
pling dimension by a large amount, thus improving effective 
particle size. For carefully chosen IS-MT directions, the net 
effect is smaller total error, especially when the available N 
is small. An open issue is to find rigorous techniques to select 
the IS-MT directions to ensure maximum reduction in error. A 
related issue is to study the stability of PF-MT or PF-EIS-MT, 
i.e. to show that the increase in PF error due to the IS-MT 
approximation at a certain time to goes to zero with t fast 
enough and thus the net error due to IS-MT at all times is 
bounded. A related work is [38] which analyzes the RB-PF. 
An interesting open question is if Compressed Sensing [39] 
can be used to select the IS-MT directions and when. 
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